Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code

نویسندگان

  • Nghi D. Q. Bui
  • Lingxiao Jiang
چکیده

Translating a program written in one programming language to another can be useful for software development tasks that need functionality implementations in different languages. Although past studies have considered this problem, they may be either specific to the language grammars, or specific to certain kinds of code elements (e.g., tokens, phrases, API uses). This paper proposes a new approach to automatically learn cross-language representations for various kinds of structural code elements that may be used for program translation. Our key idea is two folded: First, we normalize and enrich code token streams with additional structural and semantic information, and train cross-language vector representations for the tokens (a.k.a. shared embeddings based on word2vec, a neural-network-based technique for producing word embeddings; Second, hierarchically from bottom up, we construct shared embeddings for code elements of higher levels of granularity (e.g., expressions, statements, methods) from the embeddings for their constituents, and then build mappings among code elements across languages based on similarities among embeddings. Our preliminary evaluations on about 40,000 Java and C# source files from 9 software projects show that our approach can automatically learn shared embeddings for various code elements in different languages and identify their cross-language mappings with reasonable Mean Average Precision scores. When compared with an existing tool for mapping library API methods, our approach identifies many more mappings accurately. The mapping results and code can be accessed at https://github.com/bdqnghi/hierarchical-programming-language-mapping. We believe that our idea for learning cross-language vector representations with code structural information can be a useful step towards automated program translation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hierarchical Reasoning with Distributed Vector Representations

We demonstrate that distributed vector representations are capable of hierarchical reasoning by summing sets of vectors representing hyponyms (subordinate concepts) to yield a vector that resembles the associated hypernym (superordinate concept). These distributed vector representations constitute a potentially neurally plausible model while demonstrating a high level of performance in many dif...

متن کامل

Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders

Current approaches to learning vector representations of text that are compatible between different languages usually require some amount of parallel text, aligned at word, sentence or at least document level. We hypothesize however, that different natural languages share enough semantic structure that it should be possible, in principle, to learn compatible vector representations just by analy...

متن کامل

Wiktionary-Based Word Embeddings

Vectorial representations of words have grown remarkably popular in natural language processing and machine translation. The recent surge in deep learning-inspired methods for producing distributed representations has been widely noted even outside these fields. Existing representations are typically trained on large monolingual corpora using context-based prediction models. In this paper, we p...

متن کامل

Title: Miniature Language Learning via Mapping of Grammatical Structure to Visual Scene Structure in English and Japanese

The current research demonstrates a system that employs relatively simple learning mechanisms to construct mappings between structure in visual scenes, and the grammatical structure of sentences that describe those scenes. These learned mappings allow the system to processes natural language sentences in order to reconstruct complex internal representations of the visual scenes that those sente...

متن کامل

Evolving Distributed Representations for Language with Self-Organizing Maps

We present a neural-competitive learning model of language evolution in which several symbol sequences compete to signify a given propositional meaning. Both symbol sequences and propositional meanings are represented by high-dimensional vectors of real numbers. A neural network learns to map between the distributed representations of the symbol sequences and the distributed representations of ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2018